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Abstract 

Drug repurposing has become an increasingly attractive approach to drug development owing to the ever-growing cost of 
new drug discovery and frequent withdrawal of successful drugs caused by side effect issues. Here, we devised Functional 
Module Connectivity Map (FMCM) for the discovery of repurposed drug compounds for systems treatment of complex 
diseases, and applied it to colorectal adenocarcinoma. FMCM used multiple functional gene modules to query the 
Connectivity Map (CMap). The functional modules were built around hub genes identified, through a gene selection by 
trend-of-disease-progression (GSToP) procedure, from condition-specific gene-gene interaction networks constructed from 
sets of cohort gene expression microarrays. The candidate drug compounds were restricted to drugs exhibiting predicted 
minimal intracellular harmful side effects. We tested FMCM against the common practice of selecting drugs using a genomic 
signature represented by a single set of individual genes to query CMap (IGCM), and found FMCM to have higher 
robustness, accuracy, specificity, and reproducibility in identifying known anti-cancer agents. Among the 46 drug 
candidates selected by FMCM for colorectal adenocarcinoma treatment, 65% had literature support for association with 
anti-cancer activities, and 60% of the drugs predicted to have harmful effects on cancer had been reported to be associated 
with carcinogens/immune suppressors. Compounds were formed from the selected drug candidates where in each 
compound the component drugs collectively were beneficial to all the functional modules while no single component drug 
was harmful to any of the modules. In cell viability tests, we identified four candidate drugs: GW-8510, etacrynic acid, 
ginkgolide A, and 6-azathymine, as having high inhibitory activities against cancer cells. Through microarray experiments 
we confirmed the novel functional links predicted for three candidate drugs: phenoxybenzamine (broad effects), GW-8510 
(cell cycle), and imipenem (immune system). We believe FMCM can be usefully applied to repurposed drug discovery for 
systems treatment of other types of cancer and other complex diseases. 
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Introduction 

An important goal for biomedical research is to understand the 
underling genetic mechanisms of human diseases and discover 
therapeutic drugs for the diseases. Drug discovery is expensive; the 
average research and development (R&D) cost in the past 15 years 
for developing a new drug exceeds 1 billion US dollars [1]. Anti- 
cancer agents are especially costly [2]. The standard R&D 
procedure includes compound identification, toxicity tests in cell 
and animal models, safety evaluation on early clinical trials, and 
efficacy in late phase trials. The very high failure rate has led to a 
crisis known as innovation gap in new drug discovery [3]. The 
crisis is further compounded by the withdrawal of many previously 



thought successful drugs, mostly through issues related to harmful 
side effects [4-6] . Such issues may be a corollary of the prevailing 
method for new drug discovery, which is to find specific 
biomolecules as targets, typically membrane receptors [7]. 
However, a biological target may regulate more than one 
biological pathway, only one of which may be disease related. If 
this is the case, then altering the function of a biological target by a 
drug can lead to unintended results of disruption of healthy 
pathways [8]. 

The strategy of finding specific biological targets for drug 
treatment may have also contributed to the disappointing progress 
made in the last 40 years in reducing the overall mortality rates for 
most types of cancer [9] . This is partly because cancer is a disease 
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involving the dysfunction of multiple parallel pathways controlling 
many fundamental processes [10]. Cancer cells accumulate 
multiple genetic mutations that equip it with a of myriad of 
survival and death-avoiding capabilities: for inducing angiogenesis; 
to maintain proliferative signalling; to escape suicidal apoptotic 
programs; for enabling replicative immortality; and to activate 
invasion and metastasis [11]. Evidences are emerging that the 
pathology of cancer is more a consequence of small abnormalities 
on many genes, than a major abnormality on a single gene 
[12,13], and that drug compounds acting on multiple targets may 
be a more effective treatment strategy than a single drug on a 
single target [14]. In short, cancer is a systems disease and ought to 
be dealt with by a systems treatment [15]. 

Here, we present a computational drug-screening procedure 
that addresses the issues raised above. Our program has two main 
aims: to surmount the innovation gap through drug repurposing, 
and to find drug compounds for a systems treatment of cancer. 
Drug repurposing is the search for novel indications for already 
approved drugs [1,16]. Because an approved dug has already been 
optimized for safety and efficacy for its originally designed 
indication, its route for approval for a novel indication may be 
significandy shorter and likely far less costly than that for a new 
drug. 

Recently the computational screening drugs for repurposing has 
been greatly facilitated by the advent of the Connectivity Map 
(CMap), a comprehensive and continuously updated database of 
the genomic profiles of many existing drugs [17]. CMap provides a 
platform for utilizing a pattern-matching strategy to determine the 
similarity, or the opposite, in genomic signatures among diseases, 
functional gene sets, and drugs. It has been employed in many 
studies for discovering repurposing drugs against common 
diseases, including diabetes and Alzheimer's disease [17], and 
for treating solid tumours, including those associated with colon 
cancer [18], breast cancer [19], and lung adenocarcinoma [20]. 

The basic concept of CMap-based repurposing drugs discovery 
studies is the identification of disease associated genomic signatures 
that reversely correlate with perturbation in genomic signature 
associated with the administration of molecules or drugs 
[17,21,22]. In these studies, the essence of the protocol - the 
individual-gene CMap approach (IGMP) - for identifying drugs 
for treating a specific disease is straightforward: find a set of 
differentially expression genes (DEGs) obtained by, say, comparing 
two sets - controls and patients - of gene expression microarrays, 
score the match between the DEC set and genomic profiles of 
drugs given by CMap, and rank the drugs by score. Candidate 
drugs are those with the highest scores. Because it draws on the 
entire genomic information of the patients and of the drug, one 
may view this approach as an attempt at systems treatment. 
However, it suffers from being too crude an approach. In 
particular it makes no specific reference to any of the many 
altered states of biological functions associated with the disease. By 
not paying attention to individual biological functions, a "best" 
drug could very well be a compromise, chosen for having strong 
beneficial effects on a subset of functions at the expense of being 
harmful to some other functions. 

Another study that utilizes variable gene signatures to screen 
repurposed drugs has successfully identified many heterogeneous 
Food and Drug Administration (FDA) approved drug candidates 
for breast, myelogenous leukemia, and prostate cancer [23]. This 
method typically yields a long list of heterogeneous drug 
candidates without providing details that may help in differenti- 
ating the drugs, details such as how a drug differendy impact the 
multi-functionalities of (a specific) cancer. Other more sophisti- 
cated methods based on computational network models have been 



developed to identify novel therapeutic targets for the purpose of 
treating regulatory cellular networks [24,25]. The effectiveness of 
these approaches, which aim to elevate the relative activity of 
certain cell regulatory networks, and base their predictions on 
elaborate models optimally tuned to fit existing temporal and 
spatial data, may be restricted by the limited existing knowledge 
on networks and parameters describing protein activities. 

Here, we present a novel analytical framework, called Func- 
tional Module Connectivity Map (FMCM), for the discovery of 
drug compounds for systems cancer treatment. We constructed 
condition-specific function-function networks (FFNs) and applied a 
gene-selection-by-trend-of-progression procedure (GSToP) [26] to 
identify complexly connected and highly expressed hub genes in 
the FFNs. We then used functional modules constructed around 
the hub genes to query CMap for the discovery of ontology- 
specific repurposing drugs, and further screened the drugs by 
requiring that they exhibit minimum intracellular harmful side 
effects. Relative to the standard IGCM protocol, FMCM was 
more robust in its drug selection and it more consistendy predicted 
higher hit rates (~65%) on effective drugs against early 
tumorigenesis in colorectal cancer. When checked against known 
drug indications in Therapeutic Target Database (TTD), FMCM 
showed significantly higher accuracy and lower false positive rates 
on the discovery of the anti-cancer agents than IGCM, except for 
the immune system. Our viability tests on eight of the candidate 
drugs showed three, GW-8510, ginkgolide A, and 6-azathymine, 
represented high inhibitory activities against the survival of cancer 
cell lines with specific concentrations and administration dura- 
tions. Follow-up microarray experiments confirmed that both the 
CMap and our datasets showed consistent results on three 
independent drugs - phenoxybenzamine (broad effects), GW- 
8510 (cell cycle), and imipenem (immune system). These results 
demonstrated the effectiveness of FMCM, and suggested its 
potential for formulating repurposed drug regimes with minimum 
harmful side effects in cancer patients. 

Materials and Methods 

Data sources 

Gene expression data for 32 patients with sporadic colorectal 
polyps (adenoma) and corresponding adjacent normal mucosa 
from the same individuals were obtained from Gene Expression 
Omnibus (GEO) database (accession number: GSE8671) [27]. We 
extracted 30,047 protein entries and 39,194 protein-protein 
interactions (PPIs) from the Human Protein Reference Database 
(HPRD) [28] and used Gene Ontology (GO) [29] for functional 
information. 

External database 

We used the Connectivity Map database (CMap) build 02 [17], 
with 6,100 treatment expression profiles representing 1,309 drugs 
(and compounds), to compute enrichment scores (ES) of gene set 
against drugs. 

For reference on drug indication we used L01 class, antineo- 
plastic agents, Anatomical Therapeutic Chemical (ATC) Classifi- 
cation System, World Health Organization (WHO) (http:/ /www. 
whocc.no/). 

We extracted information on known therapeutic protein targets, 
relevant diseases or cancers, and corresponding drugs (787 drugs; 
60% of CMap datasets) from the Therapeutic Target Database 
(TTD: http://bidd.nus.edu.sg/group/ttd/) [30]. In addition, we 
queried key words on searching engines to define relative 
therapeutic drugs on cancer treatment. 
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We downloaded the annotated 4,884 gene-sets from the 
Molecular Signatures Database (MSigDB: http:/ /www. 
broadinstitute.org/gsea/msigdb/index.jsp) [31]. The gene-sets 
are of four types: C2: curated gene-sets from known pathways, 
online databases, and knowledge of domain experts; C3: motif 
gene-sets based on conservative cis-regulatory motifs from human, 
mouse, rat, and dog genomes; C4: computational gene-sets 
determined by co-expression neighbourhoods centered on 380 
cancer-related genes; C5: gene-ontology gene-sets collected from 
the same GO annotations of genes. Gene symbols in each gene-set 
were combined and converted into HG-U133A Affymetrix ID 
according to the updated annotation file at the website http:// 
www.affymetrix.com/ estore/. 

Gene selection by individual gene analysis (IGA) and 
Individual gene connectivity map (IGCM) 

Differentially expressed genes (DEGs) were selected using the 
Significance Analysis of Microarrays algorithm (SAM) [32]. Unless 
otherwise stated, threshold values for false discovery rate (FDR) 
<0.05 and fold change (FC) >2 were used. Enrichment score (ES) 
matching gene set to drug was computed through CMap [17]. 

Beneficial and harmful drugs 

Given a gene set, a drug was designated beneficial or harmful if 
the ES is < — 0.5 or >0.3. For drugs to be designated beneficial a 
randomization /)-value<0.005 was required, unless otherwise 
stated. 

Construction of gene-gene interaction network (GGIN) 
and function-function interaction network (FFN) 

For a given condition - control (Nor) or adenoma patient (Ade) 
- and a Pearson /j-value (see below) threshold po, we included a 
pair of genes in the GGIN if: (1) the Rvalue for the pair was not 
greater than p Q ; (2) the protein pair encoded by the gene pair was 
linked in the PPI data. For a given set of n microarrays, a Pearson's 
correlation coefficient (PCC) between a pair of genes was 
calculated using the two sets of n intensities of the pair. Each 
PPC is assigned a Pearson /(-value based on permutation tests and 
/-statistics. Genes in each type-specific GGIN were assigned to 
over-represented biological functions, called functional modules, 
through Gene Ontology term association [29]. Enrichment 
analyses based on conditional hypergeometric test [33] were 
made using the R package GOstats downloaded from the 
Bioconductor website. Each GGIN was reduced to function- 
function network (FFN) using functional modules as nodes. 

GSToP and the functional module connectivity map 
(FMCM) framework 

The FMCM framework for selecting therapeutic drug com- 
pound consisted of two segments, selection of functional modules 
of predicted cancer genes based on the GSToP procedure [26] 
(steps 1-5 below), and multiple queries, one for each functional 
module, of the CMap for drug identification (steps 6-8). Steps in 
the selection procedure (Figure 1) were: (1) Construct Nor and Ade 
GGINs and FFNs using threshold Pearson /j-value = 0.001. (2) 
SAM. Identify DEGs for Ade vs. Nor using thresholds FDR <0.01 
and FC >2. (3) GSToP. Assign a gene as a cancer gene if: (a) it 
appears in at least the Ade or Nor GGIN; (b) its degrees and 
clustering coefficients increase (decrease) along the sequence. (4) 
Take overlap of SAM and GSToP lists. (5) Cancer genes 
(including up-regulated and down-regulated genes) form function- 
al modules having GO terms used for the FFNs. (6) Beneficial and 
harmful drug lists. Use functional modules separately to query 



drugs in the CMap [17] to obtain for each function two lists 
respectively for predicted beneficial (ES < — 0.5) and harmful (ES 
>0.3) drugs (see above for requirement on randomization Rvalue). 

(7) Function-drug association map (FDAM). Use the drug lists to 
construct a map with two kinds of nodes, function module and 
drug, and two kinds of function-drug links, beneficial and harmful. 
Include in FDAM only drugs that have at least one beneficial link. 

(8) Construct from FDAM all predicted drug compounds, where a 
compound is minimum set of purely beneficial drugs that covered 
all functions. 

Accuracy, specificity, and reproducibility in performance 
tests 

Positives NB and negatives NA were drugs predicted to be 
beneficial and harmful, respectively; true positives TP and false 
negatives FN were known anti-tumor agents predicted to be 
beneficial and harmful, respectively; true negatives is TN = NA- 
FN, and false positive is FP = NB-TP. Accuracy was defined as 
(TP+TN) / (NB+NA) , and specificity as TN/(FP+TN). 

For reproducibility a drug prediction procedure (FMCM or 
IGCM) was repeated 10 times, each time working on a set of 40 
randomly chosen microarrays, 20 each from controls and patients, 
and reproducibility was measured over the 10x9/2 = 45 pairs of 
results. For each pair reproducibility was (the size of) the 
intersection of the two sets of selected drugs divided by the 
geometric mean of the two sets. 

Cell cultures and reagents 

Human colon cancer cell lines (HCT116, RKO, SW403, and 
SW620), and breast cancer cell lines (MCF7) were obtained from 
ATCC (American Type Culture Collection, Manassas, VA) and 
maintained as suggested by ATCC. The growth media for all cell 
lines were supplemented with 10% fetal bovine serum (FBS), 50 
units/ml of penicillin and streptomycin, and incubated at 37°C 
with 5% carbon dioxide. In experiments, cells were treated with 
ethanol, water or DMSO as corresponding vehicle control. 
Phenoxybenzamine, GW-8510, etacrynic acid, ginkgolide A, 
triflusal, imipenem, 6-azathymine were purchased from Santa 
cruz (CA). Phthalylsulfathiazole was purchased from Sigma (St. 
Louis, MO). Phenoxybenzamine, phthalylsulfathiazole, etacrynic 
acid, ginkgolide were dissolved in ethanol. Imipenem was dissolved 
in water. The remaining drugs were dissolved in Dimethyl 
sulfoxide (DMSO). 

Cell Proliferation assay 

Proliferation activities of five cell lines - colon cancer, HCT1 16, 
RKO, SW403, and SW620, and breast cancer, MCF7 - were 
monitored by Alamar Blue (Molecular Probes, Invitrogen Corpo- 
ration), an oxidation-reduction reagent, and determined by 
measuring the reduction of resazurin (blue, non-fluorescent) to 
resorufin (red, highly fluorescent). Cells were seeded in 96-well 
cultured plates and, following the study design of the CMap [17], 
treated with single drugs with concentration of 0, 0.1, 1,10, 30 uM 
for 5 days, then assayed for proliferation activities. One-tenth 
volume of alamar blue reagent was added and plates were 
incubated at 37°C for 2-3 hours. Cell viability was determined by 
measuring fluorescence with excitation at 550 nm and emission at 
590 nm on Synergy HT (BioTek Instruments, Winooski, VT). Cell 
survival was calculated as relative value of the difference between 
the reductions of Alamar Blue in treated versus controls. 
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Figure 1. Flowchart of methodology. 
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RNA extraction and microarrays 

Cells were seeded in 100 mm dishes and treated with drugs. 
After drug treatment for 6 hours, total cellular RNA was isolated 
using mirVana miRNA Isolation Kit (Ambion, Austin, TX) 
according to the manufacturer's instructions. 250 ng of total RNA 
was used for microarray experiments. Extracted RNA was labelled 
with GeneChip® 3' IVT Express Kit (Affymetrix, Santa Clara, 
CA, USA) and hybridized onto Affymetrix GeneChip® Prime 
View Human Gene Expression Array. This array contained 
approximately 530,000 probes covering more than 36,000 
transcripts and variants. Raw images were analysed by Affymetrix 
GeneChip® Operating Software. We performed microarray 
analysis of the effect of imipenem and phenoxybenzamine (PB) 
(treated and non-treated) on HCT1 16 and MCF7, and GW-8510 
on HCT116 under the U219 (primeview) platform, all in 
duplicates. Treatment dosages and duration times were the same 
as in [17]. 

Microarray experiments and analysis by IGA and gene-set 
approach (GSA) 

Genome-wide gene expression profiles from drug-perturbed 
tumour cells evaluated by the Affymetrix GeneChip® Prime View 



platform were analyzed in R environment (version 2.15.1). Two 
cell lines, HCT116 and MCF7, were treated with three drugs, 
GW-8510, phenoxybenzamine (PB), and imipenem, with the same 
dosages (10 uM, 11.8 uM, 13.4 uM, respectively) and time 
(6 hours after overnight culture) as in [17]. The microarray 
profiles were compared with ten profiles from the CMap for 
MCF7 treated with the three drugs. Gene expression intensities 
were normalized by Robust Multi-array Average (RMA) [34]. In 
the IGA approach DEGs were identified by one-way ANOVA 
using the eBayes function in the limma package [35] . In a gene-set 
approach (GSA), given a list of ranked differential gene 
expressions, we used GSEA [31] to convert the 4,884 annotated 
gene sets in MSigDB [3 1] to a list of 4,884 ranked ESs, then 
applied one-way ANOVA to find differential gene sets. In IGA (or 
GSA) a gene (or a gene-set) with false discovery rate (FDR) less 
than 0.01 was considered significant and selected for two-way 
hierarchical clustering of the microarray set. GO terms for 
overrepresented gene (or gene-set) clusters in the IGA (or GSA) 
heatmap were determined using DAVID [36] . 
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Results 

Function-function networks 

High quality of microarray data was indicated by the clean 
separation of control (from 32 normal tissues) and sample (32 
patients) in Principle Component Analysis (Figure SI A). The 
2,164 DEGs selected by SAM (with thresholds FDR <0.01 and 
FC >2) correctly classified the controls and sample in hierarchical 
clustering (Figure SIB). Clustering results were not sensitive to 
moderate variations in threshold values (not shown). Gene-gene 
interaction networks (GGINs) were constructed with a threshold of 
Pearson /i<0.001 from the control and adenoma cohort micro- 
array data (Figure S2). The adenoma GGIN has 6.4% more genes 
(1,792 vs. 1,684) and 32% more links (2,656 vs. 2,017) than the 
control GGIN. The difference between the two cases became 
evident when the GGINs were reduced to function-function 
networks (FFNs) having functional modules as nodes (Figure 2, 
Table SI). Cell cycle, DNA replication, and DNA repair 
functional modules were much larger in the adenoma FFN and 
exhibited much high levels of intra-function activity. There were 
also more inter-module activities in adenoma than in control. In a 
noted exception, the inter-module activity between immune 
system process and cell proliferation was weaker in adenoma than 
in control. 

Repurposed therapeutic drugs selected by IGCM 

CMap gives enrichment scores (ES) to gene lists not longer than 
1,000 entries. We complied with this constraint (i.e., restricting the 



size of the DEGs) by requiring FDR <0.01. Five DEG lists with 
FC thresholds of 3.0 to 5.0 with 0.5 intervals were generated and 
their ES's for the 1,308 drugs (or small compounds) were obtained 
by querying CMap. The list of beneficial (i.e., anti-adenoma) drugs 
was sensitive to (the threshold value of) FC, with the size of the list 
decreasing with increasing FC (Figure 3A). The number of 
beneficial validated drugs decreased with increasing FC (Figure 
S3). According to TTD, many known therapeutic anti-cancer 
drugs, such as chrysin (pink, TTD id: DNC004715), GW-8510 
(red, TTD id: DNC004631), daunorubicin (cyan, TTD id: 
DAP000788), apigenin (light purple, TTD id: DNC004714), 
resveratrol (yellow green, TTD id: DNC001205), coincidentally all 
changed from beneficial at FC = 3 to harmful at FC = 3.5 
(Figure 3A). At FC = 5.0, the most stringent threshold that we 
used mostly, AG-0 12559 was the only beneficial drug under 
permutation /><0.005 (Figure S3). 

Repurposed therapeutic drugs selected by FMCM 

In the FMCM program, genes selected in each functional 
module (Table S2) were used separately to query CMap, yielding 
separate functional specific drug lists. Each functional module was 
the union of the control and adenoma functional modules given by 
the respective FFNs, filtered by the GSToP procedure (see 
Methods). In FMCM the gene size of the module had a much 
stronger dependence on the value of FC than IGCM (Figure 3). In 
IGCM, size of DEG dropped from just above 600 to 200 when the 
value of FC threshold was raised from 3 to 5 . In FMCM module 
gene size dropped from about 600 to about 30 as (the threshold) 
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Figure 2. Function-function networks for colorectal adenoma. Condition specific function-function networks (FFNs) were generated from 
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corresponding GGIN forming over-represented Gene Ontology terms. FMs containing less than 70 genes are not shown. The diameter of a node 
scales with the logarithm of the number of genes in the node. The color shade of a node indicates the number of intra-node gene-gene interactions 
per gene. The thickness of the edge indicates the number of inter-node gene-gene interactions. 
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Figure 3. Enrichment score versus fold-change for CMap drugs. Enrichment score (ES) was obtained by querying the CMap with gene set 
(size indicated by vertical bar) determined using varying fold-change (FC) threshold. A drug is considered beneficial for the treatment for colorectal 
adenoma if ES <-0.5, harmful if ES >0.3, and neutral otherwise. (A) Screening by IGCM procedure. Querying gene set was complete set of 
differentially expressed genes (DEGs) identified from gene expression arrays of colorectal adenoma cohort (versus control) using the SAM algorithm 
with fixed FDR <0.01. (B) Screening by FMCM procedure. Querying gene sets were functional modules obtained by partition of over-represented 
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doi:1 0.1 371 /journal.pone.0086299.g003 



FC was raised from 1 to 3, and became too small for CMap 
application when FC was raised above 3. Even so, within the 
respective range of FC used, FMCM provided a much more stable 
and robust environment for drug screening by CMap than IGCM; 



in FMCM the character of selected drugs (i.e., beneficial or 
harmful) changed very little (21 out of 256, Figure 3B) while in 
IGCM changed occurred to 54.5% of selected drugs (12 out of 22, 
Figure 3A). 
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Function-drug association map (FDAM) and therapeutic 
drug compounds 

Purely beneficial and harmful drugs (see Material and Methods) 
that were beneficial to at least one FM were identified in the 
FMCM program (FC >2) and used to construct FDAM. The 46 
drugs in the FDAM (Table 1) were much more numerous than the 
corresponding list found in traditional IGCM approach (which 
had 22 drugs). Thirty of the 46, or 65%, have either been studied 
individually as anti-tumour agents or have been certified to have 
preventive effects against a broad range of cancers (Table 1 and 
Table S3). The five drugs, thapsigargin, pyrvinium, trifluopera- 
zine, ellipticine, and 029741 7-0002B, which in our FDAM were 
harmful to at least one module, have been reported to show 
evidence for carcinogenesis/immune suppression activities 
(Figure 4, Table 1 and Table S3). We view the 41 drugs on the 
FDAM with no harmful links as candidate therapeutic drugs. The 
number of modules, or degrees (Table 1), to which a candidate 
drug was beneficial varied from 1 to 7. There were two degree-7 
drugs, phenoxybenzamine and GW-8510, and three degree-5 
drugs, thapsigargin, phthalylsulfathiazole, and medrysone (see 
Table 1 for full details on degrees and drug-module relation). A 
therapeutic drug compound is a minimum set of drugs culled from 
the list of candidate therapeutic drugs that covered all the 
modules. Many compounds could be constructed from the 
candidate drug list. There were two 2-compnent compounds, 
phenoxybenzamine+ISP and GW-85 1 0+ISP, and 20 compounds 
with up to six drug components (Table 2). Barring drug-drug 
interaction, we predict these compounds to be free of harmful side 
effects at the intracellular level. 

Comparison between IGCM and FMCM 

Stability. As mentioned earlier, the characterization of a 
drug, namely beneficial or harmful, was much more stable FMCM 
than in IGCM (Figure 3). 

Accuracy and specificity. We used anti-tumor agents in the 
Therapeutic Target Database (TTD) to evaluate the accuracy and 
specificity (Material and Methods) of the FMCM and IGCM 
predictions. The "true" drug set in the test was the intersection of 
TTD and the CMap list, which included about 40% of TTD. For 
simplicity, we denote by IG3 the IGCM query at FC = 3 and the 
rest (queries with FC >3) by IGL. We found that FMCM had 
overall accuracy (Figure 5A) and specificity (Figure S4) similar to 
IG3 and higher than IGL, except for the immune system process 
module, where FMCM was worse than IGL. 

Reproducibility. We tested the reproducibility (Material and 
Methods) of the drug predictions by repeating 10 times the 
FMCM and IGCM procedures, each time working on a set of 40 
randomly chosen microarrays, 20 each from controls and patients. 
In the FMCM procedure, GGINs were constructed using DEGs 
selected by SAM at FDR <0.01 and FC >2, and the selected drug 
set was the sum of beneficial and harmful drugs. FMCM had 
significantly higher reproducibility (on average ~80%) than 
IGCM (on average ~50%), except for the module immune 
system process (on average ~60%) (Figure 5B, two-sample Hest 
/><10~ 15 ). 

Clinical application I. We took four known anti-cancer 
drugs, irinotecan (no. 29 in Table 1), thapsigargin (no. 3), 8- 
azaguanine (no. 15), and vorinostat (no. 18) as examples for 
comparison (Figure 6). The first drug, irinotecan (trade mark 
Camptosar), is in current use, in particular in combination with 
other chemotherapy agents such as 5-fluorouracil and leucovorin 
in a common colorectal cancer regimen called FOLFIRI [37], was 
significantly beneficial only to the apoptosis module by FMCM 
(Figure 6A). Thapsigargin, a known endoplasmic reticulum Ca2+ 



ATPase inhibitor [38], was both significantly beneficial and 
harmful to colorectal cancer in FMCM (apoptosis and RNA 
metabolic process, respectively, Figure 6B), but was neutral 
IGCM. The third example, 8-azaguanine, a purine analog that 
exhibits anti-neoplastic activity and have been used in the 
treatment of acute leukemia [39], was significantly beneficial to 
the apoptosis and cell proliferation modules in FMCM, but was 
harmful (not significantly) in all IGCM tests (Figure 6C). 
Vorinostat (trade mark Zolinza), a member of a larger class of 
compounds that inhibit histone deacetylases (HDACs), and known 
to arrest cancer cells epigenetically, was significantly and broadly 
beneficial in both IGCM (FC from 3 to 4.5) and FMCM (cell 
proliferation, signal transduction, and transcription) and showed 
no harmful effects (Figure 6D). These results suggested that 
FMCM have higher resolution to detect known ant-cancer agents 
than IGCM. 

Clinical application II. Using the IGCM and FMCM gene 
lists we examined the ES versus FC patterns of 27 chemo-drugs, 
not necessarily specific to colon cancer treatment, listed in the 
Anatomical Therapeutic Chemical (ATC) classification system 
(L01 class; anti-neoplastic agents), and partitioned the group by 
pattern into six types (Figure S5A-F). An overall characterization 
of our results was that IGL, which represent queries using more 
stringently selected DEGs, on the one hand and IG3/FMCM on 
the other tended to give contrast indications to many of the anti- 
neoplastic agents. In Figure S5A, to which the drug irinotecan 
belongs, drugs were mostly harmful in the IGL queries but were 
beneficial in the IG3 and most of the FMCM queries. In Figure 
S5B, the IGL queries were mostly beneficial and the IG3/FMCM 
mostly harmful. In Figure S5C and S5D, all IGCM queries (IGL 
and IG3, with one exception) were beneficial while the FMCM 
queries had significant harmful components. The most beneficial 
drug was vorinostat, the single entry in Figure S5E, where all 
queries were beneficial. The most harmful drugs were the all- 
harmful celecoxib and paclitaxel, with carmustine and imatinib 
close behind (Figure S5F). Vorinostat, doxorubicin, daunorubicin, 
irinotecan - all in Figure S5A - satisfied our stringent criteria for 
inclusion in Table 1 as components for therapeutic drug 
compounds. 

Cell viability on single predicted drugs 

Eight beneficial drugs from Table 1, phenoxybenzamine (PB; 
beneficial to 7 functional modules), GW-8510 (GW; 7), phthalyl- 
sulfathiazole (PS; 5), etacrynic acid (EA; 2), ginkgolide A (GA; 1), 
triflusal (TF; 1), imipenem (IM; 1), and 6-azathymine (6-AT; 1), 
were selected for their commercial availability and degree of 
beneficence for preliminary cell model experimental validation on 
five ceU lines: colon cancer, HCT1 16, RKO, SW403, and SW620, 
and breast cancer, MCF7. GW had the strongest effect on the cell 
lines and MCF7 was the cell line most susceptible to the tested 
drugs (Figure 7). GW, EA, GA, and 6-AT could selectively or 
broadly inhibit cell viability on the cell lines. GW, a known CDK2 
inhibitor used in protection of hair-loss in chemotherapy, 
exhibited strong inhibitions against HCT1 16 and MCF7, moder- 
ate effects against RKO and SW620, and weak effects against 
SW602. EA, GA, and 6-AT moderately to weakly inhibited the 
viability of MCF7 (Table 3). 

Microarray results and test of the perturbagen concept 

We tested the implicit perturbagen assumption [17] that CMap 
data on gene expression profiles from drug treatments on one cell 
line (MCF7) are useful for drawing inferences more generally on 
the effects of the drug, in particular, its effect on different cell lines. 
We generated five microarray global gene expression profiles, of 



PLOS ONE | www.plosone.org 



7 



January 2014 | Volume 9 | Issue 1 | e86299 



Functional Module Connectivity Map 



« o 



m m m ro 



Q 
I- 
I- 



< 



< 



I I 



.E o 

rN E 



B u 
«! £ 



OJ ro 
ro OJ 



0) »/i 

£ S 

O — 

■a 8 

o £ 



i E 



is 



ro 

ro S 



E 



ro & ^ 



o o 

■m .t: 



11 

ro ro 



' o — o 



in ^ 



=3 



<D 
in 

o 

Q. 



~ S «■ =f « 



S J. 



■D 

0 



U ~L U — < 



I i -Li 



co co a* 



B3 <5 SP 



Q. 

0) 



-g 

C 

ro 



■D 
■D 



r\i r\i rN r\l 



_0j 
J2 



s 

13 



0 

z 



c E 



E a! 



co a* o ^- 



PLOS ONE | www.plosone.org 



8 



January 2014 | Volume 9 | Issue 1 | e86299 



Functional Module Connectivity Map 



o -a 
o- 1- 



91 O 
9 at 



=5 E 

U _3 

II 
•s 3 

O 4^ 

Q. (T3 
O 
. a; 



o OJ 
.2 cr 



g 

Q 



I I 



< 



I I 



■s -° 

— n: 
.2 Q 



e = 5 s 

to aj c o 

aj Q. 113 a. 

•= 3~ ra g 

U >t S M- 

■i= ° o o 



? E ~ 



4-J CU >— 



~ cu ~ 



2 5 



E a. 



cu cD 



E & 



(U i_ i- 

4-" -M ._ 



o ~ 



.5" 33 



7 6 S O 

° o si 

c a. -° 

o _ ~ .i- 

c b = c 



E V, 

D C 

C TO 

QJ 

TO 



-Q .9- 



■D 

O 



^- <- «- 



OC ----- «—* 



UO to l/l 1/1 



QJ ~ 

it o 

CU _Q 

— TO 

q= E 

c < 

S z 

-Q DC 

I i 

01 ^ 

I s 

in 2 

JZ n 

2 E 
1 g 



■D 
■D 



C 

O 
U 



_0j 
J2 



o 
z 



S - 



€ 1 a 

S Eg 

rtJ o 

111 

o 

o 3 



01 



rN r\i r\i r\i 



_ ra c 

"S = 5 

O Q. O 

E » ^ 

1 < 5 

■| Q 5 



PLOS ONE | www.plosone.org 



9 



January 2014 | Volume 9 | Issue 1 | e86299 



Functional Module Connectivity Map 



cycloserine 



6-azathymine 

signal — - ; 
transduction^^, 



flufenamic acid 



apoptosis 



skimmianine 




pyrvimum , r|f|usa| 



immune 
system 



process > nkgolideA 



cell 
proliferation 



Line color 
"-0.8 



CD 

o' 



0 0> 
i— i- 

c/) 
o 

0.5 ° 

CD 



I 



Effective (93) 
Adverse (8) 



withaferin A 



piperlongumine 



Figure 4. Function-drug association map (FDAM) for colorectal adenoma. Nodes in the map are functional modules (FMs; gene sets) and 
drugs obtained by querying CMap using individual FMs. Drug-function links indicate beneficial (green) or harmful (red). Only drugs beneficial to at 
least one FM are included. 
doi:1 0.1 371 /journal.pone.0086299.g004 



PB and IM on HCT1 16 and MCF7, and of GW on HCT1 16 (the 
efficacy of GW on MCF7 is similar). As control we extracted from 
the CMap database 10 corresponding datasets of the three drugs 
on MCF7 (4, 4, and 2 datasets from PB, GW, and IM, 
respectively). We carried out separate two-way hierarchical 
clustering of the 15 profiles employing the IGA and GSA 
procedures. In spite of different cell lines, microarray platforms, 
and laboratory conditions, the samples, especially the five GW 
administered samples, clustered more according to drugs than not 
in both procedures (Figure 8). The outstanding qualitative 
difference between the IGA and GSA procedures was that the 
IGA heatmap was dominated by a single GO term, cell cycle 
(Figure 8A, Table S4, S5, S6), whereas the GSA heatmap was 
characterized by four terms: monosaccharide metabolic process, 
response to hormone stimulus, inflammatory response, and cell 
cycle phases (Figure 8B, Table S7, S8, S9, S10). The IGA heatmap 
corroborates the result mentioned earlier, that GW, alone among 
all the drugs, had high negative impact on cell survival. The GSA 
heatmap allowed the effects of smaller gene sets to be displayed 
and had a closer correspondence to our FMCM approach to drug 
selection. For example, it provided an independent support that 
IM, but not the other drugs tested, exhibited a strong beneficial 
effect on the immune system process in colorectal cancer cells 
(Table 1). 



Discussion 

CMap has been widely applied to drug discovery for treatment 
of complex diseases, including cancer. Methodologies employed 
for applying CMap for this purpose are simply variations of a [17]. 
The important differences between the FMCM procedure used 
here and IGCM include: (i) In addition to differential expression, 
genes selected from querying CMap were screened by the GSToP 
procedure making use of GGINs [26]; (ii) functional modules 
(FMs) were built from selected genes and used for separate 
querying of CMap to select function specific beneficial drugs 
(Table 1); (iii) querying results were used to form drug compounds, 
each compound being a minimum set of drugs that collectively 
were beneficial to all the FMs and not harmful to any FM 
(Table 2). 

Our tests showed FMCM to perform better than IGCM in 
terms of prediction stability (Figure 3), accuracy (Figure 5A), 
specificity (Figure S4), and reproducibility (Figure 5B). In short, 
FMCM was much more robust than IGCM in drugs selected. One 
reason for the relative robustness of FMCM prediction over 
IGCM may be that most modern drugs were designed to affect a 
specific biological function, say, by targeting a transcription factor, 
not to affect all functions. An FM-drug association is therefore 
expected to be more stable than that between the whole DEG set 
and the drug. 
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Table 2. Predicted drug compounds for colorectal cancer adenoma. 



Code 


No. of components 


Compound 


Ratio of degrees 


1 


2 


phenoxybenzamine + ISP 


7:1 


2 


2 


GW-8510 + ISP 


7:1 


3 


3* 


phthalylsulfathiazole + etacrynic acid + ST 


5:2:1 


4 


4 


daunorubicin + TDNA + APO + ISP 


4:2:1:1 


5 


4 


apigenin + etacrynic acid + ST + ISP 


4:2:1:1 


6 


4 


apigenin + alsterpaullone + CC + ISP 


4:2:1:1 


7 


5 


camptothecin + ifenprodil + DR + CC + ISP 


3:2:1:1:1 


8 


5 


vorinostat + etacrynic acid + TDNA + RM + ISP 


2:2:2:1:1 


9 


6 


ifenprodil + 8-azaguanine + DR + CC + ST + ISP 


2:2:1:1:1:1 


10 


6 


ifenprodil + etacrynic acid + DR + CP + CC + ISP 


2:2:1:1:1:1 


1 1 


6 


ifenprodil + alsterpaullone + DR + CP + ST + ISP 


2:2:1:1:1:1 


12 


6 


TDNA + 8-azaguanine + RM + CC + ST + ISP 


2:2:1:1:1:1 


13 


6 


TDNA + etacrynic acid + RM + CP + ST + ISP 


2:2:1:1:1:1 


14 


6 


TDNA + alsterpaullone + RM + CC + CP + ISP 


2:2:1:1:1:1 


15 


6 


TDNA + vorinostat + RM + CP + APO + ISP 


2:2:1:1:1:1 


16 


6 


vorinostat + etacrynic acid + DR + TR + RM + ISP 


2:2:1:1:1:1 


17 


6 


vorinostat + ifenprodil + DR + CC + APO + ISP 


2:2:1:1:1:1 


18 


6 


ifenprodil + 8-azaguanine + DR + CC + ST+ ISP 


2:2:1:1:1:1 


19 


6 


ifenprodil + alsterpaullone + DR + CC + CP+ ISP 


2:2:1:1:1:1 


20 


6 


ifenprodil + etacrynic acid + DR + CP + ST+ ISP 


2:2:1:1:1:1 



The eight GO terms (biological function classifications) included are APO, CC, CP, ST, TR, DR, CP, RM and ISP (see abbreviations below). All drugs in the table have ES 
(enrichment score) < — 0.75 (with only one exception). None of the drugs have harmful effects (ES >0) on any of the GO functions. Only compounds with up to 6 
components are given. Abbreviations: ISP: immune system process - trifusal or morantel or gingolide or cetirizine or imipenem; APO: apoptosis - irinotecan or 
doxazosin or cycloserine or repaglinide; CC: cell cycle - doxorubicin or withaferin A; ST: signal transduction - 6-azathymine or tyloxapol; TR: transcription - sanguinarine; 
DR: DNA replication - piperlongumine; CP: Cell proliferation - bepridil; RM: RNA metabolic process - skimmianine; TDNA: transcription and DNA replication -chrysin or 
thioguanosine or luteolin or thiostrepton or sulconazole. The "degrees" in "Ratio of degrees" indicate the number of functional modules to which the corresponding 
component is beneficial. 
doi:1 0.1 371 /journal.pone.0086299.t002 



Because IGCM does not test drugs for function specific 
harmfulness, drugs selected by IGCM may be deemed overall 
beneficial yet still harmful to some functions, or vice versa. This 
was the case for three well-known drugs used in the cancer 
chemotherapy listed in the CMap database: thapsigargin, 8- 
azaguanine, and irinotecan (Figure 6). 

Drugs selected by FMCM had high sensitivity and low false- 
positive rate. Of the total 46 candidate drugs (Table 1), 41 were 
entirely beneficial and five were harmful to some FMs. Thirty of 
46 drugs have been reported in the literature to have properties 
related to cancer. Of the 41 putative entirely beneficial drugs, 25 
have been reported to have anticancer properties and one has 
been reported to be carcinogenic (or mutagenic). Of the five 
putative partly harmful drugs, four have been reported to be both 
anticancer and carcinogenic. 

Phenoxybenzamine, an Ol-adrenergic-antagonist, was our only 
false-positive case. We identified it as a degree-7 beneficial drug, 
but there has been no literature suggesting its anti-tumorigenicity. 
Instead, there are two clinical studies suggesting possible 
carcinogenic effects in patients [40,41]. 

A novel feature of the FMCM approach was its ability to 
discover intracellular harmful side effect in agents known to be 
anti-cancer. This is not surprising given the widespread practice of 
targeting specific biomolecule in drug designs when it is now 
known that the typical regulatory relation between transcription 
factors and biological networks is many-to-many. We identified 
pyrvinium, trifluoperazine, ellipticine, and 02974 17-0002B to be 



harmful to immune system process but otherwise beneficial. The 
first three - there is no literature on 02974 17-0002B - has been 
reported to have anti-tumor properties but also have lethal effects 
in cell or animal models (Table 1). We identified thapsigargin to be 
beneficial to the apoptosis FM but broadly harmful to the signal 
transduction, transcription, cell proliferation, and RNA metabolic 
process FMs. This drug has been reported to have the ability to 
promote apoptosis on prostate and breast cancer cells but also to 
stimulate cell growth in mouse keratinocyte models (Table 1) [42— 
47]. 

Based on our in silico screening and cell viability experiments, we 
selected the four drugs, GW-8510 (GW, no. 2 in Table 1), 
etacrynic acid (EA, no. 16), ginkgolide A (GA, no. 39), and 6- 
azathymine (6-AT, no. 44), as potential therapeutic agents for 
colorectal adenoma. GW, which exhibited clear inhibitory effects 
against colon cell growth (Figure 7) in our viability experiments, is 
a cyclin-dependent kinase 2 (CDK2) inhibitor used for preventing 
hair loss in chemotherapy. It was suggested that the observed 
antitumor efficacy of GW's derives from its inhibition of tumor 
growth via cell cycle control [48] , a suggestion supported by our in 
silico study. EA is a potent inhibitor of glutathione ^-transferase 
(GST) family members and has been used to treat high blood 
pressure and swelling caused by kidney failure. It has been 
suggested that EA may inhibit cell growth and induce cancer cell 
death through apoptosis [49-51], a notion that correlated well 
with our findings (Figure 4). In our analysis the ES's for EA were 
— 0.891 and —0.875 versus the apoptosis and cell cycle modules, 
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Figure 5. Accuracy and reproducibility in drug prediction. (A) Accuracy is the sum of true positive (predicted beneficial and known anti-tumor 
agent) and true negative (predicted harmful and known cancer-inducing agent) over sum of predicted beneficial and harmful drugs. IGCM results are 
in black, and FMCM, in red and cyan. Specificity is given in Figure S5. (B) Reproducibility is the measure of agreement between the selected drugs in 
two runs using different subsets of microarray data (Materials and Methods). Results shown are averaged over 45 pair-wise comparisons of selected 
drugs. The five towers on the left are IGCM results for given threshold FC value. The eight towers on the right are FMCM results (FC >2) for the 8 
functional modules. Size of querying gene set is given by line in red. 
doi:1 0.1 371 /journal.pone.0086299.g005 
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Figure 6. Enrichment scores of known anti-cancer drugs. (A) irinotecan, (B) thapsigargin, (C) 8-azaguanine, and (D) vorinostat. CMap querying 
gene sets are shown on the horizontal axis. The first five entries from left are whole DEG sets selected by SAM using FDR = 0.01 and FC ranging from 
3.0 to 5.0. The rest are the eight functional module selected by GSToP with FC = 2.0. Star indicates permutation p-value<0.005. 
doi:1 0.1 371 /journal.pone.0086299.g006 



respectively (Table 1). EA has also been reported to have 
therapeutic potential in cancer therapy by reversing drug 
resistance [52-54]. GA, a Ginkgo biloba leaf extract, has been used 
for treatment in a wide variety of cognitive and vascular disorders, 
including dementia and peripheral arterial occlusive diseases [55- 
57]. Its structural homolog Ginkgolide B has been reported to 
possess anti-inflammatory anti-allergic, anti-oxidant, anti-cancer, 
and neuroprotective effects [58]. In our analysis GA had 
ES = — 0.834 against the immune system process module 
(Table 1). Recent studies conducted with various molecular, 
cellular, and animal model experiments have concluded that 
Ginkgolide B may have chemopreventive abilities associated with 
anti-angiogenic, antioxidant, and gene-regulatory events [59-62]. 
6-AT, an analog of thymine, has been shown to inhibit the 
pathway of biosynthesis of nuclear acids in cancer tissues [63,64]. 
It had ES = — 0.813 against the signal transduction module 



(Table 1) and inhibited tumor growth in the cell line MCF7 
(Figure 7). Our study showed that the pair GW and GA combined 
would yield beneficial effects on all eight FMs (Table 2). 

We were not able to find another gene expression microarray 
set on colon adenoma that passed our quality test (Principal 
Component Analysis, Figure SI). We did find a dataset on 
colorectal cancer (CRC) that did, from GEO database (accession 
number: GSE32323). The dataset, Affymetrix HG-U133 plus 2.0 
arrays, contained 1 7 pairs of cancer and healthy colon tissues from 
CRC patients [65] . We subjected the set to the same IGCM and 
FMCM analyses as we did the adenoma data. For the predicted 
repurposed candidate drugs, FMCM performed better than 
IGCM in stability and reproducibility, and was comparable with 
IGCM in accuracy and specificity. Through FMCM we selected 
43 candidate drugs for CRC. Among this drug set, 20 also 
belonged to the set of 46 drugs for adenoma (Figure 9). This 
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Figure 7. Viability test of colon and breast cancer cells treated with single drug. Tests were conducted on eight drugs: phenoxybenzamine 
(PB), GW-8510, phthalylsulfathiazole (PS), etacrynic acid (EA), ginkgolide A (GA), triflusal (TF), imipenem (IM), and 6-azathymine (6-AT), with 
concentrations of 0, 0.1, 1, 10, and 30 \iM. (A) Viability of MCF7 on treatment of the eight drugs. (B) Viability of five cell lines on treatment of GW-8510. 
Colon cancer cells HCT116, RKO, SW403 and SW620, and the breast cancer cell MCF7, were treated with single drug for 5 days. After 5 days, 
proliferation activities of these cells were detected by Alamar Blue assay. 
doi:1 0.1 371 /journal.pone.0086299.g007 

drugs selected for cell viability tests. This is significant because the 
tests were conducted prior to the analysis of CRC data. 

There are limitations inherent to our approach. It depends on 
the availability of gene expression profiles of drugs; many FDA- 
approved drugs are not among the more than 1,000 compounds 
with profiles given in CMap database 2.0 release, the version used 
in this work; the implicit perturbagen assumption that the drug- 
specific genomic profiles given by CMap are only mildly 
dependent on the specific (mostly breast cancer) cell lines requires 
validation in every application; the reliability of our results 
depends on the quality and accuracies of patient gene expression 
profiles, PPI, and GO data; and, specific to this work, how the 
drug screening results may depend on the selection of FMs for 
drug screening has not been thoroughly studied. 

Our results need further experimental validation because 
therapeutic efficacy of a drug is always more complex than just 
a simple matching of expression profiles, even when conducted in 
the refined fashion of FMCM. In addition, not known at this stage 
are how components in a compound (Table 2) would interact with 
each other and how the interaction would impact its predicted 
property. Such effects can only be assessed in animal model tests 
when the selected drugs are applied in compound form. 

These limitations are common to all CMap-based or similar 
drug discovery methods, yet our method has merits not possessed 
by other screening methods. We believe our method is useful for 
drug discovery for therapeutic systems treatment not only for 
colorectal adenoma, but also for other types of cancer as well as for 
other complex diseases and conditions. 



overlap set of 20 drugs included 9 of the 1 3 adenoma drugs with 
degrees equal to or greater than 3 (Table 1). Given that IGCM- 
based drug screening for disease specific but different data sets is 
known to lead to drug sets that are highly variable, and that colon 
adenoma and CRC are related but not the same conditions, the 
degrees of overlap between the colon adenoma and CRC drug sets 
are encouraging. The overlap set contained 5 of the 8 adenoma 



Table 3. Inhibitory effects of single predicted drugs on colon 
cancer and breast cancer cell lines. 



Half maximal inhibitory concentration (IC50) (|iM) 

Drugs HCT116 RKO SW403 SW620 MCF7 

phenoxybenzamine - - 

GW-8510 0.7 3.3 >30 8.4 0.8 

phthalylsulfathiazole - - 

etacrynic acid - - 6.80 

ginkgolide A - - 22.5 

triflusal - - 

imipenem - - 

6-azathymine - - - - 7.9 

-: not detected from 0.1 to 30 fiM. 
doi:1 0.1 371 /journal.pone.0086299.t003 
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oxidative phosphorylation 
(p = 8.97x10 4 ) 



cell cycle 

(p = 2.76 x 10 - 20 ) 



response to protein stimulus 
(p = 9.37x10- 8 ) 



monosaccharide metabolic 
process 

(p = 2.07x10- 31 ) 

response to hormone stimulus 
(p= 1.79x10- 9 ) 

inflammatory response 
(p = 2.25x10 11 ) 

I cell cycle phase 
(p = 7.80x10 " 3 ) 



Figure 8. Clustering of genomic profiles of drug-treated cancer cell lines HCT116 and MCF7. (A) Individual gene approach (IGA). (B) 
Gene-set approach (GSA). Cell lines were treated with three drugs: GW-8510, phenoxybenzamine (PB), and imipenem. Entries marked "cmap" were 
microarray drug treatment genomic profiles of MCF7 taken from the CMap. Others were from drug treatment microarray experiments (Affymetrix 
U219 (PrimeView) platform) conducted for the present study, where the same experimental protocol used in CMap were followed: averaged over 
three dosages of 10 M, 1 1.8 M, 13.4 M; treatment time 6 hours after overnight culture. Heatmaps were results of two-way hierarchical clustering. 
doi:1 0.1 371 /journal.pone.0086299.g008 
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Figure 9. Overlap of candidate repurposed drug sets curated 
from colon adenoma and colorectal cancer data sets. Numbers 
in brackets correspond to those given in the first column of Table 1, 
which lists the drug set for colon adenoma. The overlap includes 9 of 
the 13 drugs in Table 1 with degrees not less than 3, and 5 of the 8 
drugs selected for cell viability tests marked by "*". 
doi:1 0.1 371 /journal.pone.0086299.g009 

Supporting Information 

Figure SI Assessment of chip quality and sample 
classification of colorectal adenocarcinoma paired pa- 
tients. (A) Principle Component Analysis. The first component, 
with about 35% variances in data, was sufficient to correctly 
partition the samples, 32 of adenoma patients and their 32 
parental normal tissues, into two groups. (B) Two-way hierarchical 
clustering analysis. On the side, 2,164 differentially expressed 
genes (DEGs) were used for clustering. The DEGs, including 964 
up-regulated and 1,200 down-regulated genes, were selected by 
the SAM algorithm with thresholds for false discovery rate (FDR) 
<0.05 and fold change (FC) >2. On the top, green bar covers the 
normal tissue and blue bar, adenocarcinoma. 
(TIF) 

Figure S2 The gene-gene networks constructed using 
gene expression data from normal and colorectal 
adenoma patients. There are 1,684 genes and 2,017 links in 
the normal network, and 1,792 genes and 2,656 links in the 
adenoma network. Genes assigned to over-represented biological 
Gene Ontology terms are highlighted in term specific color. 
(TIF) 

Figure S3 Drugs selected by standard application of 
CMap, or IGCM, using different fold change (FC) 
thresholds. Sum of up- and down-regulated genes given under 
each FC threshold constituted the querying gene set. Drugs listed 
are those predicted to be beneficial. Red arrow indicates known 
TTD anti-cancer agents that coincidentally all changed from 
beneficial at FC = 3 to harmful at FC = 3.5. Vorinostat was the 
only drug selected at FC >3, 3.5, 4.0, and 4.5; it was also selected 
in the FMCM procedure. 
(TIF) 

Figure S4 Specificity of predicted drugs. Specificity is true 
negative (known cancer-inducing agent predicted to be harmful) 
over all drugs predicted to be harmful; higher specificity implies 
lower false positive. Seven of the eight FMCM results (red), except 



immune systems process (cyan), have higher specificities than the 

five IGCM results (black). 

(TIF) 

Figure S5 Enrichment scores of 27 chemo-drugs. The 27 

chemo-drugs, selected from the L01 class (antineoplastic agents) in 
the Anatomical Therapeutic Chemical system, are not specific to 
colon cancer treatment. The ES is those from five IGCM (FC 
threshold 3 to 5) and eight FMCM runs (FC >0.2). Solid symbol 
indicates an ES with permutation/; value<0. 05. The 27 drugs are 
clustered into six groups according to overall pattern. 
(TIF) 

Table SI Gene ontology enrichment analysis for func- 
tional modules. 

(XLS) 

Table S2 Gene signature tags used in the FMCM 
program. 

(XLS) 

Table S3 References listed in Table 1. 

(XLS) 

Table S4 GO terms analysis for genes in the lightblue 
block in the IGA heatmap (Figure 8A). Top- 10 gene 
ontology annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S5 GO terms analysis for genes in the pink block 
in the IGA heatmap (Figure 8A). Top- 10 gene ontology 
annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S6 GO terms analysis for genes in the purple 
block in the IGA heatmap (Figure 8A). Top- 10 gene 
ontology annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S7 GO terms analysis for genes in the green 
block in the GSA heatmap (Figure 8B). Top- 10 gene 
ontology annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S8 GO terms analysis for genes in the blue block 
in the GSA heatmap (Figure 8B). Top- 10 gene ontology 
annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S9 GO terms analysis for genes in the orange 
block in the GSA heatmap (Figure 8B). Top- 10 gene 
ontology annotation clusters were determined by DAVID [36] . 
(XLS) 

Table S10 GO terms analysis for genes in the purple 
block in the GSA heatmap (Figure 8B). Top- 10 gene 
ontology annotation clusters were determined by DAVID [36] . 
(XLS) 
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